Similarity-dissimilarity Plot for High Dimensional Data of Different Attribute Types in Biomedical Datasets

Authors

  • Muhammad Arif
  • Saleh Basalamah
Abstract

In real-life biomedical classification applications, the feature space may be of such high dimension that the class distribution cannot be visualized directly. Moreover, attributes may be numeric, ordinal, categorical, or binary, and features are often composed of a mix of attribute types. In this paper, the concept of similarity-dissimilarity is extended to these attribute types. The similarity-dissimilarity plot projects the high-dimensional feature space, which may be continuous or discrete, onto a two-dimensional plot that reveals the class separation. Furthermore, the effect of various distance measures proposed in the literature for different attribute types is studied. An index called the percentage of data points above the similarity-dissimilarity line (PAS) is proposed; it is the fraction of data points that lie nearer to their own class than to other classes. Several real-life biomedical datasets are used to show the effectiveness of the proposed similarity-dissimilarity plot and the PAS index.
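The abstract does not give the exact construction, so the following is only a minimal sketch of one plausible reading. It assumes that "similarity" is a point's distance to its nearest neighbour of the same class, that "dissimilarity" is its distance to the nearest neighbour of any other class, and that the similarity-dissimilarity line is the diagonal where the two are equal. The function names (`euclidean`, `similarity_dissimilarity`, `pas_index`) are hypothetical and do not come from the paper. The `dist` parameter is there so that a different distance measure, for example one suited to categorical or binary attributes, can be swapped in.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def similarity_dissimilarity(points, labels, dist=euclidean):
    """Return one (similarity, dissimilarity) pair per data point.

    Assumption (not confirmed by the abstract):
      similarity    = distance to the nearest other point of the same class
      dissimilarity = distance to the nearest point of a different class
    """
    pairs = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [dist(p, q)
                for j, (q, other_lab) in enumerate(zip(points, labels))
                if j != i and other_lab == lab]
        other = [dist(p, q)
                 for q, other_lab in zip(points, labels)
                 if other_lab != lab]
        # math.inf marks "no such neighbour" (e.g. a single-member class).
        pairs.append((min(same, default=math.inf),
                      min(other, default=math.inf)))
    return pairs


def pas_index(pairs):
    """Fraction of points lying nearer to their own class than to others,
    i.e. with dissimilarity strictly greater than similarity."""
    above = sum(1 for sim, dis in pairs if dis > sim)
    return above / len(pairs)


# Toy example: two well-separated classes in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
pairs = similarity_dissimilarity(points, labels)
print(pairs)
print(pas_index(pairs))
```

Plotting each pair with similarity on one axis and dissimilarity on the other gives a two-dimensional view of class separation regardless of the original dimension. Under the assumptions above, points on the dissimilarity side of the diagonal are the ones counted by the index.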

Similar resources

Optimal Feature Subset Selection Using Similarity-Dissimilarity Index and Genetic Algorithms

Optimal feature subset selection is an important pre-processing step for classification in many real-life problems where the number of dimensions of the feature space is large and some features may be irrelevant or redundant. One example of such a situation is gene expression profile data used to classify normal and cancerous samples. The contribution of this paper is fivefold. Similarity-dissimila...

Class proximity measures - Dissimilarity-based classification and display of high-dimensional data

For two-class problems, we introduce and construct mappings of high-dimensional instances into dissimilarity (distance)-based Class-Proximity Planes. The Class Proximity Projections are extensions of our earlier relative distance plane mapping, and thus provide a more general and unified approach to the simultaneous classification and visualization of many-feature datasets. The mappings display...

A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data

Similarity or distance measures are core components used by distance-based clustering algorithms to cluster similar data points into the same clusters, while dissimilar or distant data points are placed into different clusters. The performance of similarity measures is mostly addressed in two or three-dimensional spaces, beyond which, to the best of our knowledge, there is no empirical study th...

Efficient Clustering of High Dimensional Datasets with Multi Viewpoint Based Similarity Measure

Many important real-time applications involve clustering large datasets. A dataset can be large if it contains a large number of elements, if each element has many features, or if there are many clusters to discover. Recent advances in clustering algorithms have partially addressed these issues. However, there has been much less work on methods of efficiently cluster...

A Geometry Preserving Kernel over Riemannian Manifolds

The kernel trick and projection to tangent spaces are two choices for linearizing data points lying on Riemannian manifolds. These approaches are used to provide the prerequisites for applying standard machine learning methods on Riemannian manifolds. Classical kernels implicitly project data to a high-dimensional feature space without considering the intrinsic geometry of the data points. ...

Publication date: 2011